pacman::p_load(maptools, sf, raster, spatstat, tmap, tidyverse, funModeling, sfdep)Take-home Exercise 1
1 Introduction
In this take home exercise, we are tasked to apply appropriate spatial point pattern analysus methods to discover the geographical distribution of functional and non-functional waterpoints and their colocations if any in Osun state, Nigeria.
The main tasks of this exercise includes
Exploratory Spatial Data Analaysis
Second-order Spatial Point Pattern Analysis
Spatial Correlation Analysis
1.1 Installing and Loading R packages
We first start off by loading the necessary R packages into our platform.
1.2 Bringing data into platform
1.2.1 Geospatial data
This study will focus of Osun State, Nigeria. The state boundary GIS data of Nigeria can be downloaded from geoBoundaries.
1.2.1.1 Reading geospatial data
Within the geoboundaries data, we choose to use ADM2 data given that we want to investigate the distribution of water pumps within the LGAs in Osun.
geoNGA2 <- st_read(dsn = "data/geospatial", layer = "nga_admbnda_adm2_osgof_20190417") |> st_transform(crs = 26392) |>
arrange(ADM2_EN)Reading layer `nga_admbnda_adm2_osgof_20190417' from data source
`/Users/pengyouyun/youyunpeng/IS415/Take-home_Ex/Take-home_Ex01/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
#check for duplicates
geoNGA2$ADM2_EN[duplicated(geoNGA2$ADM2_EN) == TRUE][1] "Bassa" "Ifelodun" "Irepodun" "Nasarawa" "Obi" "Surulere"
duplicated_LGA <- geoNGA2$ADM2_EN[duplicated(geoNGA2$ADM2_EN) == TRUE]
# Get all the indices with names that are included in the duplicated LGA names
duplicated_indices <- which(geoNGA2$ADM2_EN %in% duplicated_LGA)
geoNGA2$ADM2_EN[94] <- "Bassa, Kogi"
geoNGA2$ADM2_EN[95] <- "Bassa, Plateau"
geoNGA2$ADM2_EN[304] <- "Ifelodun, Kwara"
geoNGA2$ADM2_EN[305] <- "Ifelodun, Osun"
geoNGA2$ADM2_EN[355] <- "Irepodun, Kwara"
geoNGA2$ADM2_EN[356] <- "Irepodun, Osun"
geoNGA2$ADM2_EN[519] <- "Nasarawa, Kano"
geoNGA2$ADM2_EN[520] <- "Nasarawa, Nasarawa"
geoNGA2$ADM2_EN[546] <- "Obi, Benue"
geoNGA2$ADM2_EN[547] <- "Obi, Nasarawa"
geoNGA2$ADM2_EN[693] <- "Surulere, Lagos"
geoNGA2$ADM2_EN[694] <- "Surulere, Oyo"The code chunk below filters out geoNGA2 into Osun state Local Government Area
osun_LGA <- c("Aiyedade","Aiyedire","Atakumosa East", "Atakumosa West",
"Ede North", "Ede South", "Egbedore", "Ejigbo", "Ife Central",
"Ife East", "Ife North", "Ife South", "Ifedayo", "Ila",
"Ifelodun, Osun","Irepodun, Osun","Ilesha East", "Ilesha West",
"Irewole", "Isokan", "Iwo", "Obokun", "Odo-Otin", "Ola-oluwa",
"Olorunda", "Oriade", "Orolu", "Osogbo", "Boripe", "Boluwaduro")
bd <- geoNGA2 |>
filter(ADM2_EN %in% osun_LGA) #create border sf that filters out Osun LGAs
qtm(bd) #checking if the border data is correctly filtered
1.2.2 Aspatial data
For the purpose of this assignment, data from WPdx Global Data Repositories will be used.
1.2.2.1 Reading aspatial data
Again, in the waterpoint data can be narrowed down to only osun state.
wp_osun <- read_csv("data/aspatial/WPdx.csv") %>%
filter(`#clean_country_name` == "Nigeria", `#clean_adm1`=="Osun")With our packages and data in place, we can now start with our analysis!
2 Exploratory Spatial Data Analysis
In deriving kernel density maps of functional and non-functional water points, we need to under-go two main processes, namely data conversion from sf to ppp for spatstat, and using spatstat functions to create a KDE object. We first start with data conversion from sf to ppp.
2.1 Data conversion from sf to ppp
2.1.1 Converting water point data into sf point features
First we need to convert the wkt field into sfc field by using st_as_sfc() data type. Next we will convert the tibble data.frame into an sf object by isomh st_sf(). it is also important for us to include the referencing system of the data into the sf object. In this case, it has the CRS of WGS 84, so we set the crs to EPSG code 4326.
wp_osun$Geometry = st_as_sfc(wp_osun$`New Georeferenced Column`)
wp_osun# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
wp_sf <- st_sf(wp_osun, crs=4326) #convert to sf, tell R what crs used for projection
wp_sfSimple feature collection with 5557 features and 70 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS: WGS 84
# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
However, we want to convert the coordinate reference system to Nigeria’s projected coordinate system. We use st_transform(), and include the EPSG code for Nigeria’s projected coordinate system: 26392.
wp_sf <- wp_sf %>%
st_transform(crs = 26392)
qtm(wp_sf) #quick view
2.1.2 Data wrangling for waterpoint data
In cleaning the waterpoint data, we first rename the column from #status_clean to status_clean for easier handling in subsequent steps. select() of dplyr is used to include status_clean in the output sf data.frame. - mutate() and replace_na() are used to recode all the NA values in status_clean into unknown.
wp_sf_nga <- wp_sf |>
rename(status_clean = '#status_clean') |>
select(status_clean) |>
mutate(status_clean = replace_na(
status_clean, "unknown"
))2.1.3 Extracting water point data
Now we are ready to extract the water point data according to their status.
The code chunk below is used to extract functional water points.
wp_functional <- wp_sf_nga |>
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))The code chunk below is used to extract nonfunctional waterpoint.
wp_nonfunctional <- wp_sf_nga |>
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional due to dry season",
"Non-Functional",
"Non functional due to dry season"))The code chunk below is used to extract water point with unknown status.
wp_unknown <- wp_sf_nga |>
filter(status_clean %in%
c("unknown"))Next, the code chunk below is used to perform a quick EDA on the derived sf data.frames.
freq(data = wp_functional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Functional 2319 88.17 88.17
2 Functional but needs repair 248 9.43 97.60
3 Functional but not in use 63 2.40 100.00
freq(data = wp_nonfunctional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Non-Functional 2008 92.15 92.15
2 Non-Functional due to dry season 151 6.93 99.08
3 Abandoned 15 0.69 99.77
4 Abandoned/Decommissioned 5 0.23 100.00
2.2 Data Wrangling to Prepare Data for Spatstat
2.2.1 Converting sf data frames to sp’s Spatial Class
The code chunk below uses as_Spatial(). of sf package to convert the three geospatial data from simple feature data frame to sp’s Spatial* class.
NGA_bd_sc<-as(bd, "Spatial")
NGA_bd_scclass : SpatialPolygonsDataFrame
features : 30
extent : 176503.2, 291043.8, 331434.7, 454520.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 16
names : Shape_Leng, Shape_Area, ADM2_EN, ADM2_PCODE, ADM2_REF, ADM2ALT1EN, ADM2ALT2EN, ADM1_EN, ADM1_PCODE, ADM0_EN, ADM0_PCODE, date, validOn, validTo, SD_EN, ...
min values : 0.26445678806, 0.00248649736648, Aiyedade, NG030001, Aiyedade, NA, NA, Osun, NG030, Nigeria, NG, 17134, 18003, NA, Osun Central, ...
max values : 1.8470166597, 0.0737271661922, Osogbo, NG030030, Osogbo, NA, NA, Osun, NG030, Nigeria, NG, 17134, 18003, NA, Osun West, ...
func<-as(wp_functional, "Spatial")
funcclass : SpatialPointsDataFrame
features : 2630
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Functional
max values : Functional but not in use
nonfunc<-as(wp_nonfunctional, "Spatial")
nonfuncclass : SpatialPointsDataFrame
features : 2179
extent : 180539, 290616, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Abandoned
max values : Non-Functional due to dry season
2.2.2 Converting the Spatial* class into generic sp format
spatstat requires the analytical data in ppp object form. There is no direct way to convert a Spatial* classes into ppp object. We need to convert the Spatial classes* into Spatial object first.
The codes chunk below converts the Spatial* classes into generic sp objects.
NGA_bd_sp <- as(NGA_bd_sc, "SpatialPolygons")
funcsp<-as(func, "SpatialPoints")
nonfuncsp<-as(nonfunc, "SpatialPoints")2.2.3 Convert generic sp format into spatstat’s ppp format
funcppp<-as(funcsp, "ppp")
nonfuncppp<-as(nonfuncsp,"ppp")2.2.4 Checking for Duplicated Points
We can check for the duplication in a ppp object by using the code chunk below. We see that there are no duplicated points.
any(duplicated(funcppp))[1] FALSE
any(duplicated(nonfuncppp))[1] FALSE
2.2.5 Creating Owin Object
To confine our analysis with the geographical area Osun, we create an object called owin in spatstat to represent the polygonal region.
The code chunk below is used to convert NGA_bd_sp into owin object of spatstat.
NGA_owin<- as(NGA_bd_sp, "owin")2.2.6 Combine point events object and owin object
In this last step of geospatial data wrangling, we will extract the waterpoints that are located within Osun by using the code chunk below.
#funcppp
funcppp1<-funcppp[NGA_owin]
summary(funcppp1)Planar point pattern: 2529 points
Average intensity 2.928471e-07 points per square unit
Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units
Window: polygonal boundary
30 separate polygons (no holes)
vertices area relative.area
polygon 1 204 766084000 0.08870
polygon 2 81 304399000 0.03520
polygon 3 97 465688000 0.05390
polygon 4 124 373051000 0.04320
polygon 5 60 149473000 0.01730
polygon 6 84 144820000 0.01680
polygon 7 50 102243000 0.01180
polygon 8 72 216002000 0.02500
polygon 9 112 269897000 0.03130
polygon 10 125 365142000 0.04230
polygon 11 83 111191000 0.01290
polygon 12 126 192557000 0.02230
polygon 13 219 904397000 0.10500
polygon 14 174 741131000 0.08580
polygon 15 81 138742000 0.01610
polygon 16 65 119452000 0.01380
polygon 17 90 280205000 0.03240
polygon 18 69 69814600 0.00808
polygon 19 69 42727500 0.00495
polygon 20 49 30458800 0.00353
polygon 21 62 263505000 0.03050
polygon 22 93 438930000 0.05080
polygon 23 87 274127000 0.03170
polygon 24 105 509979000 0.05910
polygon 25 98 292058000 0.03380
polygon 26 64 327765000 0.03800
polygon 27 133 108945000 0.01260
polygon 28 122 462169000 0.05350
polygon 29 94 109715000 0.01270
polygon 30 95 61239800 0.00709
enclosing rectangle: [176503.22, 291043.82] x [331434.7, 454520.1] units
(114500 x 123100 units)
Window area = 8635910000 square units
Fraction of frame area: 0.613
plot(funcppp1)
#nonfuncppp
nonfuncppp1<-nonfuncppp[NGA_owin]
summary(nonfuncppp1)Planar point pattern: 2059 points
Average intensity 2.384232e-07 points per square unit
Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units
Window: polygonal boundary
30 separate polygons (no holes)
vertices area relative.area
polygon 1 204 766084000 0.08870
polygon 2 81 304399000 0.03520
polygon 3 97 465688000 0.05390
polygon 4 124 373051000 0.04320
polygon 5 60 149473000 0.01730
polygon 6 84 144820000 0.01680
polygon 7 50 102243000 0.01180
polygon 8 72 216002000 0.02500
polygon 9 112 269897000 0.03130
polygon 10 125 365142000 0.04230
polygon 11 83 111191000 0.01290
polygon 12 126 192557000 0.02230
polygon 13 219 904397000 0.10500
polygon 14 174 741131000 0.08580
polygon 15 81 138742000 0.01610
polygon 16 65 119452000 0.01380
polygon 17 90 280205000 0.03240
polygon 18 69 69814600 0.00808
polygon 19 69 42727500 0.00495
polygon 20 49 30458800 0.00353
polygon 21 62 263505000 0.03050
polygon 22 93 438930000 0.05080
polygon 23 87 274127000 0.03170
polygon 24 105 509979000 0.05910
polygon 25 98 292058000 0.03380
polygon 26 64 327765000 0.03800
polygon 27 133 108945000 0.01260
polygon 28 122 462169000 0.05350
polygon 29 94 109715000 0.01270
polygon 30 95 61239800 0.00709
enclosing rectangle: [176503.22, 291043.82] x [331434.7, 454520.1] units
(114500 x 123100 units)
Window area = 8635910000 square units
Fraction of frame area: 0.613
plot(nonfuncppp1)
2.2.7 Rescale ppp data into km
In the code chunk below, rescale() is used to convert the unit of measurement from meter to kilometer
funcppp1.km<-rescale(funcppp1, 1000, "km")
nonfuncppp1.km<-rescale(nonfuncppp1, 1000, "km")With our data cleaned and in thr right format, we can move on to compute the Kernel Density Estimation!
2.3 First-order Spatial Point Pattern Analysis
2.3.1 Computing kernel density Estimation using adaptive bandwidth selection method
In spatial analysis, the choice of bandwidth is important for determining the smoothness of a surface fit to the data. The bandwidth determines the width of the smoothing kernel used in spatial smoothing techniques like kernel density estimation or local regression. We compare between two types of bandwidth methods, fixed and adaptive bandwidth, to be applied to this situation.
Fixed bandwidth methods use a constant value for the bandwidth throughout the analysis, regardless of the distribution of the data. This can be useful when the data has a consistent structure, but if the data is highly variable or has multiple modes, a constant bandwidth may not provide an adequate fit to the data.
Adaptive bandwidth methods, on the other hand, use a variable bandwidth that adjusts based on the local structure of the data. This allows for more flexibility in the fitting process.
From our initial plots, we can see that there is some evidence of clustering, leading to our choice of using adaptive bandwidth in our analysis.
funckde_adaptive<- adaptive.density(funcppp1.km, method="kernel")
nonfunckde_adaptive<- adaptive.density(nonfuncppp1.km, method="kernel")
par(mfrow=c(1,2))
plot(funckde_adaptive, main="Functional Waterpoint KDE using Adaptive Bandwidth")
plot(nonfunckde_adaptive, main="Non-Functional Waterpoint KDE using Adaptive Bandwidth")
2.3.2 Convert KDE into grid object
Converting KDE into a gridded object that is suitable for mapping purposes.
gridded_kde_funckde <- as.SpatialGridDataFrame.im(funckde_adaptive)
gridded_kde_nonfunckde <- as.SpatialGridDataFrame.im(nonfunckde_adaptive)2.3.3 Convert gridded output into raster
Next we convert the gridded kernal density objects into RasterLayer objet using raster() of raster package.
funckde_raster<-raster(gridded_kde_funckde)
funckde_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : 1.320603e-16, 23.9989 (min, max)
nonfunckde_raster<-raster(gridded_kde_nonfunckde)
nonfunckde_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : 6.85125e-17, 20.92404 (min, max)
Notice that the crs is NA
2.3.4 Assigning Projection Systems
The code chunk below is used to include the CRS information funckde_raster and nonfunckde_raster.
projection(funckde_raster) <- CRS("+init=EPSG:26392 +units=km")
funckde_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs
source : memory
names : v
values : 1.320603e-16, 23.9989 (min, max)
res(funckde_raster)[1] 0.8948485 0.9616045
projection(nonfunckde_raster) <- CRS("+init=EPSG:26392 +units=km")
nonfunckde_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs
source : memory
names : v
values : 6.85125e-17, 20.92404 (min, max)
res(nonfunckde_raster)[1] 0.8948485 0.9616045
2.3.5 Viewing object in tmap
Finally, we will display the raster in cartographic quality map using tmap package.
tmap_mode("view")
tm_basemap(server ="OpenStreetMap")+
tm_shape(funckde_raster) +
tm_raster("v") +
tm_layout(main.title = "Raster Plot of Functional Waterpoint KDE",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)tmap_mode("view")
tm_basemap(server ="OpenStreetMap")+
tm_shape(nonfunckde_raster) +
tm_raster("v") +
tm_layout(main.title = "Raster Plot of Non-Functional Waterpoint KDE",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)Change tmap mode to plot.
tmap_mode("plot")3 Second-order Spatial Point Patterns Analysis
For the purposes of Spatial Point Analysis, we are attempting to determine a pattern in terms of either clustering or standardization of point spread. However, in order to do so, we must first reject the null hypothesis that they are randomly distributed to arrive at any conclusion. Thus:
The test hypotheses are:
Ho = The distribution of Functional/Non Functional Waterpoints are randomly distributed.
H 1= The distribution of Functional/Non Functional Waterpoints services are not randomly distributed.
We will set a 95% confidence interval for the purpose of this study.
3.0.1 Analysing Spatial Point Process Using K-Function
- Computing K-function estimate In our analysis, we choose to use the K function illustrates how the spatial clustering or dispersion of feature centroid changes when the neighborhood size changes.
Referencing this website, when the observed K value is larger than the expected K value for a particular distance, the distribution is more clustered than a random distribution at that distance. When the observed K value is smaller than the expected K value, the distribution is more dispersed than a random distribution at that distance.
K_Fun <- Kest(funcppp1, correction = "Ripley")
K_NonFun <- Kest(nonfuncppp1, correction = "Ripley")
par(mfrow=c(1,2))
plot(K_Fun, .-r ~r, ylab="K(d)-r", xlab ="d(m)")
plot(K_NonFun, .-r ~r, ylab="L(d)-r", xlab ="d(m)")
- Generating Monte Carlo test with K-Function
K_Fun.csr <- envelope(funcppp1, Kest, nsim = 39, rank = 1, glocal = TRUE)Generating 39 simulations of CSR ...
1, 2, [etd 1:49:01] 3, [etd 1:43:44] 4,
[etd 1:41:02] 5, [etd 1:37:17] 6, [etd 1:35:52] 7, [etd 1:32:42] 8,
[etd 1:29:19] 9, [etd 1:26:32] 10, [etd 1:23:48] 11, [etd 1:20:36] 12,
[etd 1:19:02] 13, [etd 1:16:52] 14, [etd 1:14:14] 15, [etd 1:11:14] 16,
[etd 1:08:09] 17, [etd 1:05:13] 18, [etd 1:02:35] 19, [etd 59:46] 20,
[etd 56:47] 21, [etd 53:51] 22, [etd 50:53] 23, [etd 47:46] 24,
[etd 44:39] 25, [etd 41:40] 26, [etd 38:37] 27, [etd 35:35] 28,
[etd 32:31] 29, [etd 29:30] 30, [etd 26:26] 31, [etd 23:31] 32,
[etd 20:32] 33, [etd 17:35] 34, [etd 14:38] 35, [etd 11:40] 36,
[etd 8:44] 37, [etd 5:49] 38, [etd 2:54] 39.
Done.
K_NonFun.csr <- envelope(nonfuncppp1, Kest, nsim = 39, rank = 1, glocal = TRUE)Generating 39 simulations of CSR ...
1, 2, [etd 1:03:19] 3, [etd 1:01:27] 4,
[etd 1:02:56] 5, [etd 1:01:13] 6, [etd 58:53] 7, [etd 57:23] 8,
[etd 55:57] 9, [etd 54:08] 10, [etd 52:12] 11, [etd 50:30] 12,
[etd 48:41] 13, [etd 46:46] 14, [etd 44:49] 15, [etd 42:53] 16,
[etd 41:06] 17, [etd 39:38] 18, [etd 37:57] 19, [etd 36:21] 20,
[etd 34:43] 21, [etd 33:03] 22, [etd 45:44] 23, [etd 42:20] 24,
[etd 39:39] 25, [etd 37:22] 26, [etd 34:57] 27, [etd 32:20] 28,
[etd 29:50] 29, [etd 27:16] 30, [etd 39:26] 31, [etd 43:28] 32,
[etd 48:37] 33, [etd 40:43] 34, [etd 33:11] 35, [etd 25:59] 36,
[etd 19:05] 37, [etd 12:28] 38, [etd 6:07] 39.
Done.
- plot
To interpret the K function simulation results, we plot out the simulations as a confidence interval, looking at the observed spatial pattern, expected spatial pattern and confidence interval.
summary(K_Fun.csr)Pointwise critical envelopes for K(r)
and observed value for 'funcppp1'
Obtained from 39 simulations of CSR
Alternative: two.sided
Upper envelope: pointwise maximum of simulated curves
Lower envelope: pointwise minimum of simulated curves
Significance level of Monte Carlo test: 2/40 = 0.05
Data: funcppp1
summary(K_NonFun.csr)Pointwise critical envelopes for K(r)
and observed value for 'nonfuncppp1'
Obtained from 39 simulations of CSR
Alternative: two.sided
Upper envelope: pointwise maximum of simulated curves
Lower envelope: pointwise minimum of simulated curves
Significance level of Monte Carlo test: 2/40 = 0.05
Data: nonfuncppp1
par(mfrow=c(1,2))
plot(K_Fun.csr, main="Functional Waterpoint K Simulations")
plot(K_NonFun.csr, main="Non-functional Waterpoint K Simulations")
From the graph above, we see that the observed line falls above that of the confidence interval for both functional and non-functional waterpoints. We reject the null hypothesis at the 95% level of significance that the distribution of functional and non functional waterpoints are randomly distributed. From the information gathered earlier, we see that given the observed K values are higher than expected, it is likely that the functional water points are clustered around each other. A similar conclusion can be drawn for non-functional waterpoints.
3.0.2 Performing Clark-Evans test
Referencing this website, we want to use the Clark-Evans test to reinforce our conclusions. The Clark and Evans (1954) aggregation index R is a crude measure of clustering or ordering of a point pattern. It is the ratio of the observed mean nearest neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. A value R>1 suggests ordering, while R<1 suggests clustering.
clarkevans.test(funcppp1,
correction="none",
clipregion="NGA_owin",
alternative=c("two.sided"),
nsim=39)
Clark-Evans test
No edge correction
Monte Carlo test based on 39 simulations of CSR with fixed n
data: funcppp1
R = 0.44265, p-value = 0.05
alternative hypothesis: two-sided
3.0.3 Interpretation of the Clark-Evans test
Based on the Clark Evans test with 39 simulations, we find that the index R value is 0.44265, which is less than 1, indicating evidence of clustering. However, as the p-value is 0.05, we are unable to reject the null hypothesis.
3.0.4 Statistical conclusion
To conclude, there is evidence of clustering of the functional and non-functional water points using the K-function and the Clark and Evans test. However, the evidence could be weak. A more definitive answer could be obtained if we had tried to use more simulations in the test.
4 Spatial Correlation Analysis
In this section, we seek to confirm statistically if the spatial distribution of functional and non-functional waterpoints are independent from each other.
4.1 Data Wrangling
We first start with a plot of the distribution of the waterpoints using wp_sf_nga and bd objects defined earlier.
tmap_mode("view")
tm_shape(bd)+
tm_polygons()+
tm_shape(wp_sf_nga)+
tm_dots(col="status_clean")#streamlining data into functional and non-functional waterpoints onlyWe realise that under the column “status_clean”, there are too many categories, which can be difficult for interpretation especially when we want to calculate colocation quotients. We will conduct data wrangling in the next step to define the waterpoints distinctly into “functional” and “non-functional”.
nonfunc_df<-wp_sf_nga |>
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional due to dry season",
"Non-Functional",
"Non functional due to dry season")) |>
mutate(status_redefined="Non-functional")
func_df<-wp_sf_nga |>
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair")) |>
mutate(status_redefined="Functional")
#creating a status_redefined column that states if water point is either functional or non functional
df<-bind_rows(func_df, nonfunc_df) |>
mutate(status_redefined=factor(status_redefined)) |>
select(Geometry, status_redefined)
#combining data frames into 1 dfWith our new dataframe, we continue to plot a graph showing the the functional and non-functional water points using tmap.
tmap_mode("view")
tm_shape(bd)+
tm_polygons()+
tm_shape(df)+
tm_dots(col="status_redefined",
size=0.01,
border.col="black",
border.lwd=0.5)#plotting the combined dataframe4.2 Local Colocation coefficient
According to this website, the colocation analysis tool measures local patterns of spatial association between two categories of point features using the colocation quotient statistic.
Each feature in the Category of Interest (category A) is evaluated individually for colocation with the presence of the Neighboring Category (category B) found within its neighborhood. In general, if the proportion of B points within the neighborhood of A is more than the global proportion of B, the colocation quotient will be high. If the neighborhood of A contains many other A points or many other categories other than B, the colocation between the Category of Interest (category A) and the Neighboring Category (category B) will be small.
In our analysis, our category of interest (A) is functional waterpoints, and neighboring category (B) is non-functional waterpoints.
4.2.1 Preparing the vector list
FW<-df |>
filter(status_redefined == "Functional")
A<- FW$status_redefined
NFW<-df |>
filter(status_redefined == "Non-functional")
B<- NFW$status_redefined4.2.2 Preparing nearest neighbour list
In the code chunk below, st_knn() of sfdep package is used to determine the k (i.e. 6) nearest neighbours for given point geometry.
nb<-include_self(
st_knn(st_geometry(df), 6)
)4.2.3 Computing Kernel Weights
In the code chunk below, st_kernel_weights() of sfdep package is used to derive a weights list by using a kernel function.
wt<-st_kernel_weights(nb,
df,
"gaussian",
adaptive=TRUE)4.2.4 Computing LCLQ
In the code chunk below local_colocation() us used to compute the LCLQ values for each Water point event.
LCLQ<-local_colocation(A, B, nb, wt, 39)4.2.5 Joining output Table
Before we can plot the LCLQ values their p-values, we need to join the output of local_colocation() to the stores sf data.frame. However, a quick check of LCLQ data-frame, we can’t find any field can be used as the join field. As a result, cbind() of Base R is useed.
LCLQ_WP<-cbind(df,LCLQ)4.2.6 Plotting LCLQ values
In the code chunk below, tmap functions are used to plot the LCLQ analysis.
#plot the graph
tmap_mode("view")
tm_shape(bd) +
tm_polygons() +
tm_shape(LCLQ_WP) +
tm_dots(col="Non.functional")+
tm_view(set.zoom.limits = c(9,13))+
tm_shape(LCLQ_WP) +
tm_dots(col="p_sim_Non.functional")+
tm_view(set.zoom.limits = c(9,13))4.2.7 Statistical conclusion
From the statistical table, we see that the colocation coefficient is less than 1 but extremely close to one for some points.
From this website, features that have colocation quotients less than one are less likely to have category B within their neighborhood. If a feature has a colocation quotient equal to one, it means the proportion of categories within their neighborhood is a good representation of the proportion of categories throughout the entire study area.
Therefore, it is likely that there is some correlation between location of functional and non functional water points. Additionally given that the p-value is less than 0.05 for the selected points, we can say that the result is statistically significant, and that Functional and non-functional water points are dependent with each other.
tmap_mode("plot")4.3 Performing appropriate tests using second order spatial point pattern analysis technique
We will use the Cross-K Function to look into this relationship
The test hypotheses are:
Ho = The distribution of functional and non-functional waterpoints are spatially independent from each other (ie randomly distributed).
H 1= The ditribution of functional and non-functional waterpoints are not independent from each other (ie not randomly distributed).
We will set a 95% confidence interval for the purpose of this study.
4.3.1 Conversion of LCLQ data into ppp
In this analysis, we seek to perform marked point pattern analysis, based on the associated categorical measurement “status_redefined” in the waterpoint data.
df_spatialpoint<-df |>
as("Spatial") |>
as("SpatialPointsDataFrame") #creating spatial point data frame from sf
df_spatialpoint@data$status_redefined<-as.factor(df_spatialpoint@data$status_redefined) #creating a factor column for status_redefined
df_ppp<-df_spatialpoint |>
as("ppp") #converting the spatial data frame into a ppp object
df_ppp_owin<-df_ppp[NGA_owin] #creating an owin object
plot(df_ppp_owin, main = "df_ppp", which.marks = "status_redefined") #creating a quick plot to visualise the ppp object
4.3.2 Using Cross K function to check for distribution trend
We use the cross-K function to analyse the trend of distribution of both functional and non-functional waterpoints.
Lcross.csr <- envelope(df_ppp_owin,
Lcross,
i="Functional",
j="Non-functional",
correction="border",
nsim=39)Generating 39 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
Done.
We can then plot our result
plot(Lcross.csr,
xlim = c(0,10000))
4.3.3 Statistical conclusions
From the graph above, we can conclude that there is evidence of spatial dependence between 0 to 5000, and 7000 to 8000 r values. More specifically, between 0 to 5000, functional and non-functional waterpoints tend to cluster, while between 7000 to 8000, they tend to be evenly distributed.
5 Self-sourced: Comparing population and waterpoint functionality
In this section we aim to come up with useful conclusions regarding the waterpoint trends observed. Specifically, we compare the distribution of clustering of functional and non functional waterpoints with the population spread of Osun, to see if there is any correlation between both factors.
5.1 Importing data
In researching for population data for Osun state, we chose to use population density from this website.
We save the webpage as a html file, and open it using microsoft excel, to generate the table in xls format. Subsequently, we save the excel file into csv format for analysis in R. The file will be saved with the name “pop_data_nga.csv”.
pop_data<-read.csv("data/pop_data_nga.csv")
osun_pop <- pop_data %>%
rename(shapeName = `Local.gov..area.`,
HASC = `HASC....`,
Capital = `Capital.....`,
Population = `Population....`,
State = `State....`) |>
filter(State == "Osun")To plot the population data, we use ADM2 data which defines the specific geoboundaries of LGAs in states.
#Joining data from both data frames, preserving sf properties
geoNGA2_osun<-bd |>
left_join(osun_pop, by=c("ADM2_EN"="shapeName"))5.2 Plotting waterpoint data and population data
With our data ready, we can plot the projected population using tmap functions.
tm_shape(geoNGA2_osun)+
tm_fill("Density",
style = "quantile",
palette = "-Blues",
title = "Population Density of Osun LGAs")+
tm_layout(main.title = "Population Density of Osun LGAs",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_scale_bar() +
tm_grid(alpha =0.2)+
tm_shape(filter(df,status_redefined=="Functional"))+
tm_dots(col="green",
size=0.01,
border.col="black",
border.lwd=0.5,
alpha=0.5)+
tm_shape(filter(df,status_redefined=="Non-functional"))+
tm_dots(col="red",
size=0.01,
border.col="black",
border.lwd=0.5,
alpha=0.5)
Change tmap_mode to plot
tmap_mode("plot")5.3 Conclusions
From the plot, we see that high population density areas such as Ife Central, Ede North, Boripe, Ilesha West and Orolu have high coincidence of non-functional waterpoints. This could possibly imply the overuse of waterpoints by the larger population. More importantly, this finding can be used by government agencies who are pioritising repair schedules for the non-functional waterpoints.